recurrent attention
RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning
Research on continual learning has led to a variety of approaches to mitigating catastrophic forgetting in feed-forward classification networks. Until now, surprisingly little attention has been focused on continual learning of recurrent models applied to problems like image captioning. In this paper we take a systematic look at continual learning of LSTM-based models for image captioning. We propose an attention-based approach that explicitly accommodates the transient nature of vocabularies in continual image captioning tasks -- i.e. that task vocabularies are not disjoint. We call our method Recurrent Attention to Transient Tasks (RATT), and also show how to adapt continual learning approaches based on weight regularization and knowledge distillation to recurrent continual learning problems. We apply our approaches to the incremental image captioning problem on two new continual learning benchmarks we define using the MS-COCO and Flickr30k datasets. Our results demonstrate that RATT is able to sequentially learn five captioning tasks while incurring no forgetting of previously learned ones.
RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning (SUPPLEMENTARY MATERIAL)
We exploit categorical image annotations available in many captioning datasets. The influence of the people category is clearly visible. Figure 2: RATT ablation on the MS-COCO validation set using different attention masks. Evaluation is the same as for MS-COCO (Figure 4). In Figures 6 and 7, we give a comparison of performance for all considered approaches on the MS-COCO validation set. These learning curves and heatmaps allow us to appreciate the ability of RATT to remember old tasks.
Review for NeurIPS paper: RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning
Strengths: The paper is one of the first to study continual learning in recurrent settings and shows promising performance on the image captioning task. It proposes RATT, a novel approach for recurrent continual learning based on attentional masking, inspired by the earlier HAT method. The proposed method introduces three masks (a_x, a_h, and a_s) applied to the embedding, hidden state, and vocabulary, and the ablation study shows that all three components contribute to the final continual learning performance. Beyond the proposed approach, the paper also explores adapting weight regularization and knowledge distillation-based approaches to the recurrent continual learning problem. In its experiments, the paper shows strong results, largely outperforming simple baselines (such as fine-tuning) and previous regularization or distillation-based approaches (EWC and LwF).
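The masking scheme the review describes can be sketched in a few lines. The following is a minimal, hypothetical illustration (not the authors' code): per-task embeddings are squashed into near-binary gates a = sigmoid(s * e), HAT-style, and applied at the three sites the review names — the word embedding (a_x), the LSTM hidden state (a_h), and the vocabulary logits (a_s). All parameter names and shapes here are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One vanilla LSTM step; the four gates are stacked as [i, f, o, g] in W/U/b."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
    g = np.tanh(z[3 * H:])
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def ratt_like_step(token_id, h, c, params, task_emb, s=50.0):
    """Hypothetical RATT-style decoding step: near-binary task gates
    a = sigmoid(s * e) mask the word embedding (a_x), the hidden state
    (a_h), and the vocabulary logits (a_s); s is an annealing temperature
    as in HAT. Names are illustrative, not taken from the paper's code."""
    E, W, U, b, Vp = params            # embedding table, LSTM weights, output projection
    e_x, e_h, e_s = task_emb           # per-task mask embeddings
    x = E[token_id] * sigmoid(s * e_x)          # a_x: masked embedding
    h, c = lstm_step(x, h, c, W, U, b)
    h = h * sigmoid(s * e_h)                    # a_h: masked hidden state
    logits = (Vp @ h) * sigmoid(s * e_s)        # a_s: transient vocabulary
    return logits, h, c
```

Because the gates saturate to {0, 1} as s grows, units claimed by earlier tasks can be frozen during later training, which is how this family of methods avoids forgetting; the vocabulary mask a_s is the part specific to the transient-vocabulary setting.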
Meta-review for NeurIPS paper: RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning
The paper received two accept reviews and one borderline reject [R1]. R1's main concern is that the paper relies on simple and not the most recent approaches for both captioning and continual learning. The other reviewers and I agree with this assessment, but believe that for one of the first papers on continual learning for captioning this is reasonable, even if not optimal. R1 did not respond after the rebuttal. The reviewers appreciate the paper's contributions, including 1) the first paper on continual learning in image captioning.
Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model
Yinghan Long, Sayeed Shafayet Chowdhury, Kaushik Roy
Transformers have shown dominant performance across a range of domains including language and vision. However, their computational cost grows quadratically with the sequence length, making their usage prohibitive for resource-constrained applications. To counter this, our approach is to divide the whole sequence into segments and apply attention to the individual segments. We propose a segmented recurrent transformer (SRformer) that combines segmented (local) attention with recurrent attention. The loss caused by reducing the attention window length is compensated by aggregating information across segments with recurrent attention. SRformer leverages Recurrent Accumulate-and-Fire (RAF) neurons' inherent memory to update the cumulative product of keys and values. The segmented attention and lightweight RAF neurons ensure the efficiency of the proposed transformer. Such an approach leads to models with sequential processing capability at a lower computation/memory cost. We apply the proposed method to T5 and BART transformers. The modified models are tested on summarization datasets including CNN-dailymail, XSUM, ArXiv, and MediaSUM. Notably, using segmented inputs of varied sizes, the proposed model achieves $6-22\%$ higher ROUGE1 scores than a segmented transformer and outperforms other recurrent transformer approaches. Furthermore, compared to full attention, the proposed model reduces the computational complexity of cross attention by around $40\%$.
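The mechanism the abstract describes — local attention within each segment, plus a recurrent memory that aggregates information across segments — can be sketched compactly. Below is a hypothetical illustration, not the authors' implementation: the cross-segment memory is modeled as a running K^T V accumulator (a linear-attention-style stand-in for the paper's RAF-neuron memory), and all function and variable names are assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def segmented_recurrent_attention(Q, K, V, seg_len):
    """Hypothetical sketch of SRformer-style attention: each segment attends
    locally (softmax over its own keys), while a recurrent accumulator carries
    a running K^T V summary of all earlier segments, compensating for the
    shortened attention window. Shapes: Q, K are (T, d); V is (T, d_v)."""
    T, d = Q.shape
    out = np.zeros_like(V)
    mem = np.zeros((d, V.shape[1]))    # accumulated K^T V from past segments
    for s0 in range(0, T, seg_len):
        s1 = min(s0 + seg_len, T)
        q, k, v = Q[s0:s1], K[s0:s1], V[s0:s1]
        local = softmax(q @ k.T / np.sqrt(d)) @ v   # segment-local attention
        recur = q @ mem / max(s0, 1)                # summary of earlier segments
        out[s0:s1] = local + recur
        mem += k.T @ v                              # update the recurrent memory
    return out
```

The efficiency claim follows from the shapes: full attention over T tokens costs O(T^2), whereas each of the T/seg_len segments here costs O(seg_len^2) for the local term plus O(seg_len * d * d_v) for the memory read/update, which is linear in T overall.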